ABSTRACT


A linear-predictive form of the quadratic classifier (the optimal decision rule
for Gaussian random processes) is developed and applied to the discrimination
and classification of radar target signatures. The classifier was devised to
implement a sequential probability ratio test (SPRT); that is, consecutive radar
returns are observed until the target can be classified with a prescribed
probability of error. Because of the linear-predictive formulation, the computational
and storage requirements for the classifier are related only to the number of
returns necessary to predict the signature and not to the length of signature
observed; a classifier with modest storage and computational requirements can be
employed to classify signatures consisting of an arbitrarily large number of
radar returns. The classifier is related to several results in mean-square
filtering theory and has an interpretation in terms of the maximum entropy and
maximum likelihood spectral estimates for the target signatures.

FOREWORD


This report describes a decision rule structure for the discrimination of two
discrete-time random processes. The structure is recursive and its discovery
was motivated by a desire to perform sequential discrimination of targets tracked
by a radar. For sequential discrimination, an object is illuminated with
consecutive radar pulses until a classification of the object can be made to within a
predetermined probability of error.

The approach taken brings some results from estimation theory (in particular,
linear prediction) to bear on the discrimination problem. The structure of the
decision rule can be related to estimator/correlator receiver realizations for
random signal detection problems and both formulations have some common
important characteristics. Because the linear prediction can often be performed
with a relatively short history of the process, only modest amounts of storage and
computational resources are required, even when the process is observed over a
long time interval.

The sequential decision rule described here has been implemented in real time
and operates in conjunction with the TRADEX radar at Kwajalein, M.I. The
real-time implementation has been given the acronym LEAD for Linear Estimation and
Discrimination.

The report is written in the context of the radar discrimination problem and is
presented in a largely tutorial format. It should be noted, however, that the
results are applicable to target discrimination involving other types of sensors
(sonic, optical, etc.) and to other similar detection and identification problems as
well. The reader with a basic knowledge of statistical decision procedures will
find that the results are derived in a straightforward manner and that the report
is self-contained. Several references are provided for the related material on
mean-square filtering and spectral analysis.

THE APPLICATION OF LINEAR PREDICTION
TO SEQUENTIAL CLASSIFICATION OF RADAR TARGET SIGNATURES


I.  INTRODUCTION 

This report describes a new form of quadratic classifier (the optimal decision rule for
Gaussian random processes) that is being used for discrimination and classification of radar
target signatures.* The classifier is sequential in nature and is well suited for real-time
operation. The signatures are observed a few returns at a time and a classification may be made
after any number of observations. Since most forms of the quadratic classifier can be applied
only to signatures consisting of some fixed number of returns, this sequential form provides
the following advantages:

(a) Signatures of targets that are easy to discriminate are classified with a
relatively small number of returns. This saves time that would be wasted
if it were required to observe the signature for a larger, fixed number of
returns, and permits a greater number of such targets to be discriminated
in a given real-time interval.

(b) Signatures of targets that are difficult to discriminate can be observed for
an arbitrarily large number of returns before classification. This
capability, which derives from our special formulation of the sequential
decision rule, leads to a higher percentage of correct classifications of these
targets than could be achieved if classifications were restricted to
signatures with a fixed maximum number of returns.

The sequential classifier has no greater storage and computational requirements than a
similarly configured non-sequential quadratic classifier, and when both are applied to signatures
having the same number of returns, classification results are identical. However, because it
permits targets to be discriminated with a varying number of observations, the sequential form
of the classifier results in a more efficient use of radar resources and an overall improvement
in discrimination performance.

The sequential decision rule on which the classifier is based is characterized by a set of
recursive decision stages. Each stage consists of the observation of returns from one or more
pulses illuminating the target, and an attempt to classify the target based on these and all
previously observed returns. The process is repeated through successive stages until a
classification can be made at a prescribed level of confidence.

This  report  shows  that  a realization  of  the  sequential  decision  rule  can  be  obtained  that 
consists  of  two  basic  steps:  linear  prediction  followed  by  an  "incremental"  classification.  At 
each  decision  stage,  a linear  prediction,  based  on  previous  observations,  is  made  of  the  radar 
returns  to  be  observed  during  that  stage.  An  error  term,  formed  as  the  difference  between  the 
predicted  and  the  observed  values  of  the  signature,  is  then  applied  to  an  "incremental"  quadratic 
classifier.  The  output  of  the  incremental  classifier  is  accumulated  in  a sum  of  previous  outputs 
and  compared  to  a set  of  thresholds  to  make  a classification  decision. 


* The term "signature" refers to the time-ordered returns from a set of consecutively
transmitted radar pulses illuminating a given target.


The classifier structure suggests two modes of operation: a normal mode in which
prediction of the radar signature is based on the returns observed in all previous decision stages, and
an extended mode in which prediction of the signature is based on the returns observed in only
some fixed maximum number of previous stages. It is important to note that while signature
prediction in the extended mode is based on the radar returns observed in only a subset of the
previous stages, signature classification is based on the returns observed in all previous stages.

The extended mode of operation leads to important advantages in computer storage and
execution time. A sequential classifier with modest storage and computational requirements
can achieve the performance of a non-sequential classifier with much larger storage and
computational requirements. In addition, since the computation during each stage of extended
operation is constant and proportional to the (fixed) number of returns used for prediction,
computational requirements place no upper limit on the ultimate length of signature that can be observed.

The  use  of  the  sequential  classifier  in  extended  mode  has  produced  some  outstanding  results. 
Two  examples  will  illustrate  these. 

Table  I shows  the  results  of  applying  a sequential  classifier  to  500  simulated  signatures 
each  of  a re-entry  vehicle  (RV)  and  a fragment  from  the  associated  missile  final  stage  (tank). 


TABLE I

CLASSIFICATION RESULTS FOR A SEQUENTIAL CLASSIFIER
USED IN THE NORMAL MODE*

Number of   Total Number of         Classification       Average Number
Decision    Signatures Classified   Errors (percent)     of Radar Returns
Stages      RV       Fragment       RV       Fragment    RV       Fragment

  1           1        78           100.0      0.0       --         1.0
  2           9       116            11.1      0.0       --         1.3
  3          18       155            16.7      0.0       --         1.7
  4          48       182             8.3      0.0       --         2.1
  5          95       215             7.4      0.0       --         2.5
  6         147       231             4.8      0.0       --         2.8
  7         203       260             3.9      0.8       --         3.2
  8         257       288             3.5      0.7       --         3.7
  9         286       297             3.8      1.0       --         3.9
 10         333       317             3.9      1.9       --         4.3
 11         378       325             3.4      2.8       --         --
 12         401       335             3.2      3.6       --         --
 13         418       341             3.3      3.8       --         --
 14         432       345             3.5      3.8       --         --
 15         445       354             3.6      4.5       --         --
 16         457       366             3.5      4.9       --         --
 16+†       500       500             5.8     20.4       --         --

* Dual-polarized simulated signatures at L-band.
† Remaining signatures classified according to a single default threshold.
-- Entry not legible in the source.


The classifier was exercised in the normal mode (i.e., the returns from all previous decision
stages were used to predict the signature) and a single radar return was observed at each
decision stage. The table lists for each stage the number of signatures of each target classified,
the cumulative error rates, and the average number of returns used to classify the signatures.
For a given signature to be classified at a given stage, the classifier output for that signature
must fall outside a pair of widely separated thresholds. After sixteen decision stages, all the
unclassified signatures were classified according to a single default threshold. It was found
that the final error rates of 5.8 and 20.4 percent were the same as those for a non-sequential
quadratic classifier based on sixteen returns, while the average number of returns used for
classification was only 9.1 for the RV and 8.3 for the fragment. In other words, the average
classification time was approximately cut in half.

Table II shows the results of classifying the same signatures using a classifier with
considerably reduced storage requirements. This classifier was used in the extended mode;
prediction of the signature was based on a maximum of three previous returns. The final error
rates and average number of returns for each target are almost identical to those in Table I.
This classifier has storage and computational requirements that are about 1/16 of those for the
previously cited non-sequential classifier; yet it obtained the same error rates and classified
the targets with approximately half the number of returns.


TABLE II

CLASSIFICATION RESULTS FOR A SEQUENTIAL CLASSIFIER
WITH REDUCED STORAGE REQUIREMENTS USED IN THE EXTENDED MODE*

Number of   Total Number of         Classification       Average Number
Decision    Signatures Classified   Errors (percent)     of Radar Returns
Stages      RV       Fragment       RV       Fragment    RV       Fragment

  1           1        76           100.0      0.0       1.0        1.0
  2          11       115             9.1      0.0       1.9        1.3
  3          24       149            12.5      0.0       2.5        1.7
  4          63       179             6.3      0.6       3.4        2.1
  5         106       214             5.7      0.5       4.1        2.6
  6         166       243             5.4      0.4       4.8        3.0
  7         218       263             4.1      0.8       5.3        3.3
  8         257       289             3.5      1.4       5.7        3.7
  9         294       305             3.4      2.3       6.1        --
 10         329       318             3.0      2.8       6.5        4.2
 11         369       327             2.7      4.0       --         4.4
 12         401       336             2.7      4.2       --         4.6
 13         418       349             3.1      5.2       7.6        4.9
 14         431       353             3.2      5.1       7.8        5.0
 15         448       363             3.3      5.5       8.1        5.3
 16         454       368             3.3      5.7       8.2        5.5
 16+†       500       500             6.0     20.0       8.9        8.2

* Dual-polarized simulated signatures at L-band.
† Remaining signatures classified according to a single default threshold.
-- Entry not legible in the source.

As a second example, a sequential classifier was applied to the signatures of an
experimental RV and one of its decoys. Both objects were contained in a cloud of chaff. Figure 1
shows the operating characteristic (plot of RV error rate vs decoy error rate) for sequential
and non-sequential quadratic classifiers with identical storage requirements. The
non-sequential classifier observed ten returns of each signature. The sequential classifier, applied
in the extended mode, observed an average number of returns that varied between points on the
operating characteristic, but which was generally in the range of 25 to 35. The performance
improvement is clearly significant. For example, at the 10-percent decoy error (false alarm)
rate, the non-sequential classifier had an RV error (leakage) rate of about 40 percent while the
sequential classifier had a leakage rate of about 11 percent. Note that since the average number
of returns observed by the sequential classifier was generally in the range of 25 to 35, some
signatures were observed for a larger number of returns. Thus, the performance of the
sequential classifier could probably not have been matched by even a (larger and computationally
more expensive) non-sequential quadratic classifier designed for signatures with the average
number of returns.

Fig. 1. Comparison of performance of sequential and non-sequential quadratic classifiers.

These examples illustrate the advantages that can be obtained by using the special sequential
form of the quadratic classifier. The remainder of this report describes the classifier in detail
and discusses its method of implementation. Section II develops the theory of the classifier
while Sec. III deals with computational issues. Section IV relates the sequential classifier to
the current literature in linear mean-square prediction theory and provides the justification for
the use of the classifier in the extended mode. In addition, Sec. IV provides an interpretation
of the classifier in terms of maximum likelihood and maximum entropy spectral estimation.
Finally, Sec. V summarizes results.

II. DEVELOPMENT OF THE SEQUENTIAL QUADRATIC CLASSIFIER


A.  Basic  Form  of  the  Sequential  Decision  Rule 


Two-class sequential decision procedures have been employed in a number of applications
(Refs. 1-4). For the application of these procedures to radar signature discrimination, an
observation vector $x_k$ is formed from samples of the radar signature at the $k$th decision stage. In the
so-called "fixed-boundary" sequential probability ratio test (SPRT), the likelihood ratio for the
observation vector $x_k$ is evaluated and compared to thresholds $T_A > T_B$. If the value of the
likelihood ratio is greater than $T_A$ or less than $T_B$, then Class 1 ($\omega_1$) or Class 2 ($\omega_2$) is decided,
respectively.

$$\ell_k = \frac{p(x_k \mid \omega_1)}{p(x_k \mid \omega_2)} \;\begin{cases} > T_A \;\Rightarrow\; \omega_1 \\ < T_B \;\Rightarrow\; \omega_2 \end{cases} \tag{1}$$


Otherwise, the process continues for another stage. A fixed number of additional observations
are made and appended to the observation vector; a new likelihood ratio is formed using the
conditional multivariate densities for the larger observation vector, and the process continues
until one of the boundaries (thresholds) is crossed. The probabilities of misclassification are
related to the thresholds by the following equations (Refs. 1 and 2):

$$\mathrm{Prob}[\mathrm{Error} \mid \omega_1] = \frac{T_B\,(T_A - 1)}{T_A - T_B} \tag{2a}$$

$$\mathrm{Prob}[\mathrm{Error} \mid \omega_2] = \frac{1 - T_B}{T_A - T_B} \tag{2b}$$
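As a rough numerical illustration of Eqs. (2), the following Python sketch evaluates the two
error probabilities for given fixed thresholds and inverts the relations to obtain thresholds
for prescribed error probabilities. The function names are hypothetical, and the range check
T_A > 1 > T_B > 0 is an assumption added here so that both results are valid probabilities;
it is not stated in the text.

```python
def sprt_error_rates(t_a, t_b):
    """Evaluate Eqs. (2a) and (2b) for fixed thresholds T_A and T_B.

    Assumption (not from the report): T_A > 1 > T_B > 0.
    Returns (Prob[Error | class 1], Prob[Error | class 2]).
    """
    if not (t_a > 1.0 > t_b > 0.0):
        raise ValueError("expected T_A > 1 > T_B > 0")
    err_given_w1 = t_b * (t_a - 1.0) / (t_a - t_b)  # Eq. (2a)
    err_given_w2 = (1.0 - t_b) / (t_a - t_b)        # Eq. (2b)
    return err_given_w1, err_given_w2


def sprt_thresholds(err_given_w1, err_given_w2):
    """Solve Eqs. (2) for the thresholds that give the requested error rates."""
    t_a = (1.0 - err_given_w1) / err_given_w2
    t_b = err_given_w1 / (1.0 - err_given_w2)
    return t_a, t_b


if __name__ == "__main__":
    t_a, t_b = sprt_thresholds(0.05, 0.10)
    print(t_a, t_b)
    print(sprt_error_rates(t_a, t_b))
```

Substituting the second function into the first returns the requested error rates, which is a
quick consistency check on the two forms of the relation.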


In the most general type of SPRT, the thresholds are not fixed at constants $T_A$ and $T_B$ but
change at each decision stage (Ref. 2). For example, the thresholds or "boundaries" can be made to
converge gradually to a single value as shown in Fig. 2(a). This forces the classifier to make
a decision after some maximum number of stages $k_0$. For the case of general time-varying
boundaries, Eqs. (2) do not apply.

A special case of time-varying boundaries is illustrated in Fig. 2(b). The boundaries
remain fixed at $T_A$ and $T_B$ until the $k_0$th stage when both are changed to a common intermediate
value $T_0$. This special case of time-varying boundaries differs from the fixed-boundary case
only in that a decision is suddenly forced after $k_0$ stages. If $k_0$ is sufficiently large so that most
observation vectors are classified before $k_0$ stages, then Eqs. (2) provide a good approximation
to the errors. Obviously in any practical implementation, sequential decision processes cannot
be allowed to continue indefinitely so that if it is desired to guarantee classification of all
observation vectors, decision boundaries of one of the forms of Fig. 2 must be employed.

Fig. 2. Types of decision boundaries for sequential classification.
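The boundary scheme of Fig. 2(b) can be sketched as a short loop. The Python function below
is a minimal illustration, not the report's implementation: its name and argument layout are
hypothetical, it takes the likelihood ratios as already computed, and it assumes that a ratio
exactly equal to the common threshold at the forced stage is assigned to Class 2.

```python
def truncated_sprt(likelihood_ratios, t_a, t_b, t_0, k_0):
    """Sequential test with fixed boundaries T_A, T_B and a forced decision at stage k_0.

    likelihood_ratios: iterable of l_1, l_2, ... (one value per decision stage).
    Returns (decided_class, stage); decided_class is None if the data run out
    before a boundary is crossed and before stage k_0 is reached.
    """
    stages_seen = 0
    for k, ratio in enumerate(likelihood_ratios, start=1):
        stages_seen = k
        if k >= k_0:
            # Both boundaries collapse to the common value T_0: decision is forced.
            return (1 if ratio > t_0 else 2), k
        if ratio > t_a:
            return 1, k
        if ratio < t_b:
            return 2, k
    return None, stages_seen


if __name__ == "__main__":
    print(truncated_sprt([1.2, 0.8, 3.0, 12.0], 10.0, 0.1, 1.0, 16))
    print(truncated_sprt([1.2, 0.8, 1.1], 10.0, 0.1, 1.0, 3))
```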

B.  Recursive  Form  of  the  Sequential  Decision  Rule 

For the application of the SPRT to radar signature discrimination, the observation vector
is partitioned as

$$x_k = \begin{bmatrix} x_{k-1} \\ \Delta x_k \end{bmatrix} \tag{3}$$

and constructed as shown in Fig. 3. Here $x_{k-1}$ is the observation vector at the $(k-1)$th stage
and $\Delta x_k$ represents the new observations at the $k$th stage. The dimension of $\Delta x_k$ is equal to
$n\Delta p$ where $\Delta p$ is the number of new returns observed per stage (which is assumed to be the same
for all stages) and $n$ is the number of scalar items of information available from each return.


[Figure: returns along the TIME axis, grouped into PREVIOUS OBSERVATIONS and NEW OBSERVATIONS.]

Fig. 3. Construction of observation vector for sequential
classification of radar signatures.

In particular, if the radar is narrowband and noncoherent with only the principal polarization
(PP), then $n$ will be equal to 1. If the radar is coherent with only PP, or if the radar is
noncoherent with both principal and orthogonal polarizations (OP), then $n$ will be equal to 2. If the
radar is coherent in both PP and OP, then $n$ will be equal to 4.*


* Our formulation models the coherent signatures from each polarization as samples from two
correlated real random processes (representing amplitude and phase or real and imaginary
parts). An alternative formulation would treat the signatures as samples from a single
complex process.

The likelihood ratio in Eq. (1) can be written in the recursive form

$$\ell_k = \frac{p(\Delta x_k \mid x_{k-1}, \omega_1)}{p(\Delta x_k \mid x_{k-1}, \omega_2)} \cdot \ell_{k-1} \tag{4}$$


Note that when the new observations $\Delta x_k$ are independent of the previous observations $x_{k-1}$,
the first term in Eq. (4) depends only on $\Delta x_k$ and so the decision rule is considerably simplified.
In order to achieve independence for signatures described by Gaussian statistics one can, by the
use of Karhunen-Loève analysis, transform the observations into a coordinate system where
they are uncorrelated and therefore independent. This is the approach taken in Refs. 3 and 4.
Our approach is somewhat different. It will be shown that when the observations are jointly
Gaussian, but not necessarily independent, one can separate the decision rule into a step
consisting of a linear estimation of the new observations and a step involving a likelihood ratio test
based on the prediction errors. It can further be shown (see Sec. IV) that this is essentially a
different path to achieving independence. The prediction errors used for classification are
uncorrelated and represent only the new or "innovations" information present in the new
observations.


C.  Decision  Rule  for  Jointly  Gaussian  Observations 

If the observations $x_k$ are jointly Gaussian, then the probability densities in Eq. (1) are
given by

$$p(x_k \mid \omega_i) = \frac{1}{(2\pi)^{k n \Delta p/2}\, |K_k^i|^{1/2}} \exp\left(-\tfrac{1}{2} Q_k^i\right), \qquad i = 1, 2 \tag{5a}$$

with

$$Q_k^i = (x_k - m_k^i)^T (K_k^i)^{-1} (x_k - m_k^i) \tag{5b}$$

where $m_k^i$ is the mean vector and $K_k^i$ is the covariance matrix for the observation vector $x_k$ of
the $i$th class. $Q_k^i$ is the squared Mahalanobis distance between the observation vector and the
mean vector of the $i$th class and is explicitly defined here because of its importance in deriving
the ensuing results.

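The conditional probability densities appearing in Eq. (4) for the new observations $\Delta x_k$ can
be derived by first noting that

$$p(\Delta x_k \mid x_{k-1}, \omega_i) = \frac{p(\Delta x_k, x_{k-1} \mid \omega_i)}{p(x_{k-1} \mid \omega_i)} = \frac{p(x_k \mid \omega_i)}{p(x_{k-1} \mid \omega_i)} \tag{6}$$

Thus, substituting Eq. (5) one obtains

$$p(\Delta x_k \mid x_{k-1}, \omega_i) = \frac{(2\pi)^{-k n \Delta p/2}\, |K_k^i|^{-1/2} \exp\left(-\tfrac{1}{2} Q_k^i\right)}{(2\pi)^{-(k-1) n \Delta p/2}\, |K_{k-1}^i|^{-1/2} \exp\left(-\tfrac{1}{2} Q_{k-1}^i\right)} = \frac{1}{(2\pi)^{n \Delta p/2} \left(|K_k^i| / |K_{k-1}^i|\right)^{1/2}} \exp\left(-\tfrac{1}{2} \Delta Q_k^i\right) \tag{7}$$

where

$$\Delta Q_k^i = Q_k^i - Q_{k-1}^i \tag{8}$$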

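Let the mean vector and the covariance matrix be partitioned to correspond to the partitioning
of the observation vector, i.e.,*

$$m_k = E[x_k] = \begin{bmatrix} m_{k-1} \\ \Delta m_k \end{bmatrix}, \qquad K_k = E\left[(x_k - m_k)(x_k - m_k)^T\right] = \begin{bmatrix} K_{k-1} & B_k \\ B_k^T & D_k \end{bmatrix} \tag{9}$$

It is shown in Appendix A that the corresponding inverse covariance matrix can be expressed
in partitioned form as

$$K_k^{-1} = \begin{bmatrix} K_{k-1}^{-1} + G_k E_k^{-1} G_k^T & -G_k E_k^{-1} \\ -E_k^{-1} G_k^T & E_k^{-1} \end{bmatrix} \tag{10}$$

where

$$G_k = K_{k-1}^{-1} B_k \tag{11}$$

$$E_k = D_k - B_k^T K_{k-1}^{-1} B_k \tag{12}$$

It is further shown in Appendix A that $\Delta Q_k$ is of the form

$$\Delta Q_k = (\Delta x_k - \mu_k)^T E_k^{-1} (\Delta x_k - \mu_k) \tag{13}$$

where

$$\mu_k = \Delta m_k + G_k^T (x_{k-1} - m_{k-1}) \tag{14}$$

and that

$$|K_k| = |K_{k-1}|\,|E_k| \tag{15}$$


* In order to reduce the complexity of notation, the class index superscripts (i) will be dropped
in Eqs. (9) through (15). It should be understood that the results apply to both classes.

By substituting Eq. (15) into Eq. (7), one obtains

$$p(\Delta x_k \mid x_{k-1}, \omega_i) = \frac{1}{(2\pi)^{n \Delta p/2}\, |E_k^i|^{1/2}} \exp\left(-\tfrac{1}{2} \Delta Q_k^i\right) \tag{16}$$

where $\Delta Q_k^i$ is given by Eq. (13). This shows that the conditional density of $\Delta x_k$ is a Gaussian
density with mean vector $\mu_k^i$ and covariance matrix $E_k^i$.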

That the conditional density $p(\Delta x_k \mid x_{k-1})$ is Gaussian when $p(x_k)$ is Gaussian is well known
(see Ref. 5, Appendix to Chap. 7). The mean of the conditional density $\mu_k$, regarded as a
function of the previous observations $x_{k-1}$, is known in statistics as the regression of $\Delta x_k$ with
respect to $x_{k-1}$ and it is the maximum a posteriori estimate of $\Delta x_k$. What is most important
here is that the mean of the conditional density for Gaussian processes is a linear function of
the previous observations [Eq. (14)]. Therefore, the sequential classification algorithm can be
decomposed into two simple steps; namely (1) a linear-predictive estimation of the new
observations and (2) an evaluation of the likelihood ratio for the conditional densities [Eq. (4)]. The
dimensionality of the vectors and matrices employed in this second step is $n\Delta p$, i.e., it is the
dimension of just the new observations at each stage.
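The smallest case of this decomposition has one previous observation and one new observation,
so that every partition of the covariance matrix is a scalar. The Python sketch below works
that case; the function names and the numerical values are hypothetical. It computes the
predictor gain and error variance from Eqs. (11) and (12), forms the increment of Eq. (13) with
the conditional mean of Eq. (14), and compares the results with the directly computed
difference of squared Mahalanobis distances and with the determinant relation of Eq. (15).

```python
def conditional_parameters(k_prev, b, d):
    """Scalar form of Eqs. (11) and (12) for covariance [[k_prev, b], [b, d]]."""
    g = b / k_prev          # Eq. (11): G = K_{k-1}^{-1} B
    e = d - b * b / k_prev  # Eq. (12): E = D - B^T K_{k-1}^{-1} B
    return g, e


def delta_q(x_prev, x_new, m_prev, m_new, g, e):
    """Increment of the squared Mahalanobis distance, Eqs. (13) and (14)."""
    mu = m_new + g * (x_prev - m_prev)  # linear prediction of the new observation
    return (x_new - mu) ** 2 / e


def mahalanobis_2d(u, v, a, b, d):
    """Squared Mahalanobis distance of the zero-mean pair (u, v), covariance [[a, b], [b, d]]."""
    det = a * d - b * b
    return (d * u * u - 2.0 * b * u * v + a * v * v) / det


if __name__ == "__main__":
    a, b, d = 2.0, 1.2, 1.5   # hypothetical covariance entries
    u, v = 0.7, -0.4          # hypothetical previous and new observations
    g, e = conditional_parameters(a, b, d)
    print(delta_q(u, v, 0.0, 0.0, g, e))             # increment from prediction error
    print(mahalanobis_2d(u, v, a, b, d) - u * u / a)  # Q_2 - Q_1 computed directly
    print(a * e, a * d - b * b)                       # both sides of Eq. (15)
```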


D.  Interpretation  as  Sequential  Quadratic  Classifier 

One can interpret the decision rule for non-sequential likelihood ratio tests when the
probability densities are Gaussian as a quadratic classifier. That is, the likelihood ratio test is
interpreted as forming a second-order polynomial boundary in the observation space between
the two regions in which observation vectors are classified as Class 1 or as Class 2. As such,
quadratic classifiers are employed to classify observation vectors even with non-Gaussian
statistics; one considers the classifier as defining a second-order decision boundary which is
fitted to the first and second moments of the observation vectors. The SPRT for Gaussian
statistics can also be interpreted as a (sequential) quadratic classifier. In this situation,
however, the observation space and the decision boundary increase in dimensionality with each
successive stage.

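Let $h_k$ represent minus twice the log of the likelihood ratio at the $k$th stage. Then, from
Eqs. (1) and (5) the sequential decision rule is

$$h_k = Q_k^1 - Q_k^2 + \ln\frac{|K_k^1|}{|K_k^2|} \;\begin{cases} < -2\ln T_A \;\Rightarrow\; \omega_1 \\ > -2\ln T_B \;\Rightarrow\; \omega_2 \end{cases} \tag{17}$$

where $Q_k^1$ and $Q_k^2$ are the squared Mahalanobis distances defined by Eq. (5b). The left side of
Eq. (17) has the same form as a standard quadratic classifier for observation vectors at the $k$th
stage. The classifier can be expressed in a recursive form by applying Eqs. (8) and (15) to
Eq. (17) or by applying Eqs. (4) and (16) to Eq. (1). The result is

$$h_k = h_{k-1} + \Delta h_k \;\begin{cases} < -2\ln T_A \;\Rightarrow\; \omega_1 \\ > -2\ln T_B \;\Rightarrow\; \omega_2 \end{cases} \tag{18a}$$

$$\Delta h_k = \Delta Q_k^1 - \Delta Q_k^2 + \ln\frac{|E_k^1|}{|E_k^2|} \tag{18b}$$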
where $\Delta Q_k^1$ and $\Delta Q_k^2$ are defined by Eq. (13) and where at the start of the recursion one has
$E_1^i = K_1^i$ and $h_0 = 0$. Note that in the sequential form of the classifier we need to deal with only
an incremental classifier [Eq. (18b)] at each stage. The dimension of vectors and matrices in
the incremental classifier is equal to $n\Delta p$. The change in the squared Mahalanobis distances
$\Delta Q_k^i$ at each stage involves a linear prediction of the new observations $\Delta x_k$ from the previous
observations $x_{k-1}$ as discussed earlier. Equations (13) and (14), used to compute the $\Delta Q_k^i$,
can be written in an alternative form as

$$\Delta Q_k = e_k^T E_k^{-1} e_k \tag{19a}$$

$$e_k = \Delta\tilde{x}_k - \Delta\hat{x}_k \tag{19b}$$

$$\Delta\hat{x}_k = G_k^T \tilde{x}_{k-1} \tag{19c}$$

where

$$\tilde{x}_{k-1} = x_{k-1} - m_{k-1}, \qquad \Delta\tilde{x}_k = \Delta x_k - \Delta m_k \tag{20}$$

and where $\Delta\hat{x}_1 = 0$. The prediction/classification sequence is very clear from this formulation
with the mean removed. At each stage in the classification sequence a prediction $\Delta\hat{x}_k$ of the
new observations is made by Eq. (19c). An error term $e_k$ is then formed in Eq. (19b) and this
is used in Eq. (19a) to evaluate the change in the squared Mahalanobis distance $\Delta Q_k$. These
changes are then used in the incremental classifier decision rule, Eqs. (18).
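The prediction/classification sequence can be sketched in Python for the simplest
configuration, one scalar return per stage ($n\Delta p = 1$), operated in the normal mode. This is
a minimal sketch under stated assumptions: all function names are hypothetical, the small
linear solver stands in for any conventional routine, and the stage parameters are computed on
the fly for brevity, whereas the report computes them before real-time operation.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy; inputs untouched
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def stage_parameters(cov, k):
    """Return (G_k, E_k) of Eqs. (11) and (12) for stage k (1-based), scalar returns."""
    if k == 1:
        return [], cov[0][0]                       # E_1 = K_1, nothing to predict from
    k_prev = [row[:k - 1] for row in cov[:k - 1]]  # K_{k-1}
    b = [cov[i][k - 1] for i in range(k - 1)]      # B_k
    g = solve(k_prev, b)                           # G_k = K_{k-1}^{-1} B_k
    e = cov[k - 1][k - 1] - sum(bi * gi for bi, gi in zip(b, g))
    return g, e


def sequential_statistic(x, means, covs):
    """Return [h_1, ..., h_K] of Eqs. (18); means and covs are ordered (class 1, class 2)."""
    h, history = 0.0, []
    for k in range(1, len(x) + 1):
        dq, e_k = [], []
        for m, cov in zip(means, covs):
            g, e = stage_parameters(cov, k)
            # Eq. (14): linear prediction of the new return from all previous returns.
            mu = m[k - 1] + sum(gi * (x[i] - m[i]) for i, gi in enumerate(g))
            dq.append((x[k - 1] - mu) ** 2 / e)    # Eq. (13)
            e_k.append(e)
        h += dq[0] - dq[1] + math.log(e_k[0] / e_k[1])  # Eqs. (18a), (18b)
        history.append(h)
    return history
```

Each entry of the returned list would then be compared with $-2\ln T_A$ and $-2\ln T_B$ as in
Eq. (18a). After any number of stages the accumulated value agrees with the non-sequential
statistic of Eq. (17) evaluated on the same returns.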

E.  Extension  of  Results  to  Long  Signatures 

If the statistics of the radar signatures to be classified do not change over an extended
period of time, then one would expect that there would be some number $k_s - 1$ of previous
observations called the prediction order for which the estimate $\Delta\hat{x}_k$ of the new observations would
be no worse than it would be if the estimate were based on an infinite number of previous
observations. That this is in fact true is shown in Sec. IV. Consequently, one can base the prediction
of the radar signature on the observations acquired in only the previous $k_s - 1$ stages, but
continue to exercise the classifier for any number of stages $k_0 > k_s$. By operating the sequential
classifier in this "extended mode," one can obtain the approximate performance of a classifier
with much larger computational and storage requirements. In fact, one gains an ability to
process long signatures that for reasons of excessive computation or storage requirements
could not be processed with other classifiers. The benefits of operating a sequential classifier
in the extended mode already have been demonstrated in Sec. I.

One can imagine that the sequential classifier views a signature through a sliding window
that never exceeds $(k_s - 1)\,\Delta p$ radar returns in length. Since the classifier output $h_k$ at the
$k$th stage is an accumulation of results from all previous stages [Eqs. (18)], the classification
decision is based on all returns seen by the classifier up through the $k$th stage, even if some of
the earlier returns are currently outside the window. In this way the sequential quadratic
classifier is able to obtain the effect of a much larger classifier at a fraction of the overhead. The
next section discusses some of the practical issues of storage and computational requirements
in detail.
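The sliding-window idea can be illustrated with a case in which a window of one return is
known to suffice. The Python sketch below assumes, purely for illustration, that both classes
are zero-mean stationary processes with covariance $\sigma^2 \rho^{|i-j|}$ (a first-order
autoregressive model, which is not taken from the report). For such a process the best linear
prediction from the whole past is $\rho$ times the most recent return, with error variance
$\sigma^2(1 - \rho^2)$, so the predictor stores a single previous return however long the
signature becomes. The function name and arguments are hypothetical.

```python
import math


def extended_mode_statistic(x, rho, var):
    """Accumulate h_k with a prediction window of one previous return.

    x:   sequence of scalar returns, one per stage.
    rho: (rho_1, rho_2) correlation coefficients of the two classes (assumed model).
    var: (var_1, var_2) variances of the two classes (assumed model).
    """
    h, history, prev = 0.0, [], None
    for x_new in x:
        dq, e_k = [], []
        for r, v in zip(rho, var):
            if prev is None:
                pred, e = 0.0, v                       # first stage: E_1 = K_1
            else:
                pred, e = r * prev, v * (1.0 - r * r)  # fixed predictor, fixed error variance
            dq.append((x_new - pred) ** 2 / e)
            e_k.append(e)
        h += dq[0] - dq[1] + math.log(e_k[0] / e_k[1])  # accumulate as in Eqs. (18)
        history.append(h)
        prev = x_new                                   # only the last return is retained
    return history


if __name__ == "__main__":
    print(extended_mode_statistic([1.0, 0.5], (0.5, 0.0), (1.0, 1.0)))
```

The work per stage is constant, while the accumulated statistic still reflects every return
observed so far.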

III.  COMPUTATIONAL  CONSIDERATIONS 

A.  Computation  and  Storage  of  Classifier  Parameters 

The equations that implement the sequential classifier [Eqs. (18) through (20)] involve a set
of (fixed) matrix parameters $G_k^i$ and $E_k^i$ which are derived from partitions of the covariance
matrices for the two classes of signatures. The covariance matrices [as well as the mean
vectors which appear in Eq. (20)] can be estimated from a set of training signatures for each of the
two classes.

Computation of the $G_k^i$ and the $E_k^i$ is carried out prior to any real-time operation of the
classifier. In fact, since the classifier operation involves not $E_k^i$ itself but its inverse and its
determinant, these latter quantities also are computed and stored prior to any real-time
operation. The parameters of the classifier can be conveniently stored as a symmetric matrix
of dimension $k_s n\Delta p$, built up in blocks of width $n\Delta p$,

$$P_k^i = \begin{bmatrix} P_{k-1}^i & G_k^i \\ (G_k^i)^T & (E_k^i)^{-1} \end{bmatrix}, \qquad P_1^i = (E_1^i)^{-1}, \qquad P^i = P_{k_s}^i \tag{21}$$

and a set of vectors

$$d_k = \ln\frac{|E_k^1|}{|E_k^2|}, \qquad k = 1, \ldots, k_s \tag{22}$$

and

$$m_{k_s}^i = \begin{bmatrix} m_k^i \\ \Delta m_{k+1}^i \\ \vdots \\ \Delta m_{k_s}^i \end{bmatrix} \tag{23}$$

from which the mean vectors for any stage can be obtained as a partition (as shown). A special
form of storage for the matrices $P^i$ of Eq. (21) has been developed for use with FORTRAN so
that the matrix partitions can be accessed directly and used with matrix manipulation programs
from the IBM Scientific Subroutine Package. This special form of storage is described in
Appendix B.

The matrices $E_1^i$ are equal to the first-stage covariance matrices $K_1^i$. The matrices $G_k^i$
and $E_k^i$ for $k > 1$ are computed from the partitions of the $k$th-stage covariance matrices via
Eqs. (11) and (12). If Eq. (10) is also employed, then a convenient recursive algorithm results
for computation of the classifier parameters. One begins with the full-stage covariance
matrices partitioned in a form corresponding to the partitioning of the $P^i$.


(24) 


At the first step of recursion, $K_1 = E_1$ is inverted (by any conventional procedure) and stored in $P^i$. The determinant $|E_1|$, which is usually obtained as a by-product of the inversion, is retained and used to form the first element of the vector d. At the second step of recursion, $G_2$ and $E_2$ are computed from Eqs. (11) and (12). The matrix $E_2$ is inverted and the results [$(E_2)^{-1}$ and $G_2$] are stored in $P^i$. The determinant $|E_2|$ is retained and employed to form the second element of the vector d. Equation (10) is then used to compute the inverted second-stage covariance matrix $(K_2)^{-1}$, which replaces the partitions $K_1$, $B_2$, and $D_2$ of Eq. (24) in storage, and the process continues. At the kth step of recursion, one has stored the partially inverted covariance matrix

$$\begin{bmatrix} (K_{k-1})^{-1} & B_k & \cdots \\ B_k^T & D_k & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix} \qquad (25)$$

from which one can readily proceed to compute the partitions $G_k$ and $E_k$ of $P^i$ and the inverse covariance matrix needed for the next step. The partitioned parameter matrices $P^i$, the full-stage mean vectors $m_{k_s}^i$, and the determinant vector d completely define the classifier.
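The recursion just described can be illustrated with a minimal sketch. It assumes the scalar-block case $n\Delta p = 1$ (so that each $G_k$ is a vector and each $E_k$ a scalar) and takes Eqs. (10) through (12) to be the partitioned-inverse relations $G_k = (K_{k-1})^{-1}B_k$, $E_k = D_k - G_k^T B_k$, and the block form of $(K_k)^{-1}$ derived in Appendix A. The function name `classifier_parameters` and the list-of-lists layout are illustrative only and are not the FORTRAN storage scheme of Appendix B.

```python
def classifier_parameters(K):
    """Run the recursion of Eqs. (10)-(12) on a symmetric positive-definite
    covariance matrix K (list of lists) for the scalar-block case n*dp = 1.

    Returns (G, E, K_inv): G[k] and E[k] for stages k = 1..ks (index 0 is
    unused) and the full inverse of K, built up one stage at a time.
    """
    ks = len(K)
    G = [None, []]               # G_1 is empty: no previous observations
    E = [None, K[0][0]]          # E_1 = K_1
    K_inv = [[1.0 / K[0][0]]]    # (K_1)^-1
    for k in range(2, ks + 1):
        m = k - 1                              # dimension of K_{k-1}
        b = [K[i][m] for i in range(m)]        # B_k: column above D_k
        d = K[m][m]                            # D_k
        # Eq. (11): G_k = (K_{k-1})^-1 B_k
        g = [sum(K_inv[i][j] * b[j] for j in range(m)) for i in range(m)]
        # Eq. (12): E_k = D_k - G_k^T B_k
        e = d - sum(gi * bi for gi, bi in zip(g, b))
        # Eq. (10): partitioned inverse of K_k from (K_{k-1})^-1, G_k, E_k
        new = [[K_inv[i][j] + g[i] * g[j] / e for j in range(m)] + [-g[i] / e]
               for i in range(m)]
        new.append([-gj / e for gj in g] + [1.0 / e])
        K_inv = new
        G.append(g)
        E.append(e)
    return G, E, K_inv
```

For K = [[2, 1], [1, 2]] the sketch gives $G_2 = 0.5$ and $E_2 = 1.5$, and the product $E_1 E_2 = 3$ equals $|K|$, which is why the determinants of the $E_k$ are all that needs to be retained.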

B.  Real-Time  Computation  and  Storage  Requirements 

Storage requirements for a sequential classifier that is designed to observe a maximum of $k_s\Delta p$ radar returns in the normal mode of operation are the same as those for a non-sequential quadratic classifier using that same number of returns. The primary storage allocation is for two symmetric matrices of dimension $k_s n\Delta p$, which requires $k_s n\Delta p\,(k_s n\Delta p + 1)$ words.

TABLE III
COMPUTATIONS TO EVALUATE $\Delta Q_k^i$

Computed Item                                               Multiplications                 Additions or Subtractions
$\Delta\hat{\mathbf{x}}_k = (G_k)^T \mathbf{x}_{k-1}$       $(k-1)(n\Delta p)^2$            $[(k-1)\,n\Delta p - 1]\,n\Delta p$
$\boldsymbol{\epsilon}_k = \Delta\mathbf{x}_k - \Delta\hat{\mathbf{x}}_k$    (none)         $n\Delta p$
$\boldsymbol{\epsilon}_k^T (E_k)^{-1} \boldsymbol{\epsilon}_k$               $n\Delta p\,(n\Delta p + 3)/2$  $n\Delta p\,(n\Delta p + 1)/2$

The central processor time required to execute the classifiers in real time can be approximated by the time required to compute the squared Mahalanobis distances $Q_k^1$ and $Q_k^2$ or their changes $\Delta Q_k^1$ and $\Delta Q_k^2$. Table III lists the number of computations required to evaluate $\Delta Q_k^i$ at the kth stage. Both the number of additions and the number of multiplications per stage are approximately equal to $k(n\Delta p)^2$ for $k \le k_s$ and equal to $k_s(n\Delta p)^2$ thereafter. In other words, for the normal mode of operation [$k \le k_s$], the computation is proportional to k and to the square of n and $\Delta p$. If the number of arithmetic operations is summed over all stages up to the $(k_s)$th, the total number of multiplications is $k_s n\Delta p\,(k_s n\Delta p + 3)/2$ and the number of additions is $k_s n\Delta p\,(k_s n\Delta p + 1)/2$, both of which are identical to the number of operations required to evaluate a non-sequential classifier of corresponding size. Thus, the total computational effort is the same for both types of classifiers when applied to signatures with the same number of returns. However, since the number of computations performed per stage by the sequential classifier increases linearly with the number of stages, the real-time computational resources set aside for classifier evaluation in the normal mode of operation must be large enough to permit evaluation at the final, most computationally expensive stage.

This has the effect of doubling the total real-time computational resources that must be allocated to the classifier, although in the early decision stages this time is not used. While this may at first seem wasteful, more careful consideration of the application usually will show that it is possible to ration the unused time to lower priority computer tasks not related to classifier evaluation.

In the extended mode of operation, all the allocated time is used. Further, since the computational requirements at each stage are constant [proportional to $k_s(n\Delta p)^2$], there is no limit to the number of returns that may be observed in order to classify a target.
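The operation counts quoted above can be checked with a short sketch. It assumes the per-stage counts of Table III with N standing for $n\Delta p$, and it takes the quadratic-form entries to be N(N + 3)/2 multiplications and N(N + 1)/2 additions, which are the values required for the stage-by-stage sum to reproduce the totals stated in the text. The function names are illustrative.

```python
def ops_at_stage(k, N):
    """Multiplications and additions needed at stage k of the normal mode.

    N stands for n*dp, the number of new observation components per stage.
    """
    # Prediction G_k^T x_{k-1}, then the quadratic form eps^T E^-1 eps.
    mults = (k - 1) * N * N + N * (N + 3) // 2
    # Prediction sums, the error subtraction, then the quadratic form.
    adds = ((k - 1) * N - 1) * N + N + N * (N + 1) // 2
    return mults, adds


def total_ops(ks, N):
    """Sum the per-stage counts over stages 1..ks."""
    mults = sum(ops_at_stage(k, N)[0] for k in range(1, ks + 1))
    adds = sum(ops_at_stage(k, N)[1] for k in range(1, ks + 1))
    return mults, adds


def non_sequential_ops(ks, N):
    """Closed-form totals quoted in the text for a classifier of size ks*N."""
    M = ks * N
    return M * (M + 3) // 2, M * (M + 1) // 2
```

For example, with ks = 2 and N = 2 the stages require 5 and 9 multiplications, for a total of 14, which matches the closed form 4(4 + 3)/2.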

IV.  RELATIONS  TO  MEAN-SQUARE  FILTERING  AND  SPECTRAL  ANALYSIS 

The  sequential  classifier  as  formulated  in  this  report  is  intimately  related  to  certain  prob- 
lems in  mean-square  filtering  and  spectral  estimation.  It  is  in  this  context,  in  fact,  that  the 
classifier  assumes  its  greatest  significance  and  a better  understanding  of  the  classifier  opera- 
tion in  both  the  normal  and  the  extended  mode  emerges. 

In  order  to  relate  the  sequential  classifier  to  the  important  literature  in  mean-square  fil- 
tering and  spectral  analysis,  we  will  restrict  our  attention  for  the  most  part  to  the  case  where 
the  radar  signature  has  only  one  component  (i.e.,  PP)  and  the  number  of  returns  observed  at 
each  stage  is  one.  Thus,  the  signatures  can  be  modeled  by  single  (univariate)  discrete-time 
random  processes  which  we  will  generally  assume  to  be  stationary.  None  of  these  special 
restrictions  are  absolutely  necessary;  the  results  can  be  formulated  to  include  the  more  gen- 
eral case  treated  in  the  earlier  portion  of  this  report.  However,  generalization  serves  only  to 
complicate  the  algebra  and  adds  very  little  to  the  understanding  of  the  basic  relations. 

It  should  be  observed  that  whereas  much  of  the  literature  referred  to  in  this  section  is 
couched  in  terms  of  temporal  averages  of  the  random  processes,  we  shall  adhere  to  our  use  of 
ensemble  averages.  This  should  pose  no  particular  problems  as  far  as  the  results  are  con- 
cerned, since  they  can  be  formulated  from  either  point  of  view.  In  addition,  it  will  be  assumed 
throughout  that  the  mean  of  the  signatures  has  been  removed  so  that  the  random  processes  can 
be  treated  as  zero-mean  processes. 

This section begins by showing that the predicted estimate $\Delta\hat{\mathbf{x}}_k$ of Eqs. (19) is the optimal linear mean-square estimate of the random process (the signature) regardless of the process statistics. For this result, none of the special restrictions cited earlier except the zero-mean condition are applied. Next, it is shown that when the processes are univariate and stationary, the set of recursive relations [Eqs. (10) through (12)] used to compute the classifier parameters reduces to the classical recursive relations originally formulated by Levinson (Ref. 7). The sequential classifier is then interpreted in a canonical form involving a pair of linear-predictive filters and an optimal classifier for two white Gaussian noise processes. Finally, the classifier is interpreted in terms of the maximum likelihood and maximum entropy spectral estimates for the processes.

A.  Linear  Mean-Square  Prediction  of  the  Radar  Signatures 

The problem to be considered here is that of determining the best linear estimate $\Delta\hat{\mathbf{x}}_k$ of the observations $\Delta\mathbf{x}_k$ at the kth stage from the set of previous observations $\mathbf{x}_{k-1}$ in the sense that

$$\mathcal{E}^2 = E\left[\,|\Delta\mathbf{x}_k - \Delta\hat{\mathbf{x}}_k|^2\,\right] \qquad (26)$$

is minimum. The (arbitrary) linear estimate can be written in the form

$$\Delta\hat{\mathbf{x}}_k = G_k^T \mathbf{x}_{k-1} . \qquad (27)$$

It will be shown that the particular linear transformation $G_k^T$ required to minimize Eq. (26) is given by Eq. (11).

In preparation for this result, we first define the estimation error $\boldsymbol{\epsilon}_k$ as

$$\boldsymbol{\epsilon}_k = \Delta\mathbf{x}_k - \Delta\hat{\mathbf{x}}_k \qquad (28)$$

and state the following basic theorem of mean-square estimation which is known as the orthogonality principle. The proof of the theorem is given in Appendix C.

Theorem: Let the estimate $\Delta\hat{\mathbf{x}}_k$ be defined so that the error $\boldsymbol{\epsilon}_k$ is orthogonal to the observations $\mathbf{x}_{k-1}$; i.e., $E[\boldsymbol{\epsilon}_k \mathbf{x}_{k-1}^T]$ is a matrix of zeros. Then $\Delta\hat{\mathbf{x}}_k$ minimizes Eq. (26) and the minimum mean-square error is given by

$$\mathcal{E}^2_{\min} = E[\boldsymbol{\epsilon}_k^T \Delta\mathbf{x}_k] . \qquad (29)$$

This theorem can be used to show that the optimal prediction matrix $G_k$ is given by Eq. (11). The theorem requires that

$$E[\boldsymbol{\epsilon}_k \mathbf{x}_{k-1}^T] = [\,0\,] . \qquad (30)$$

By substituting Eqs. (27) and (28) in Eq. (30) and employing the definitions in Eq. (9b), one has

$$E[(\Delta\mathbf{x}_k - G_k^T \mathbf{x}_{k-1})\,\mathbf{x}_{k-1}^T] = B_k^T - G_k^T K_{k-1} = [\,0\,] \qquad (31)^{*}$$

from which Eq. (11) follows directly. By substituting the same equations in Eq. (29), one can obtain an expression for the minimum mean-square error:

$$\mathcal{E}^2_{\min} = E[\boldsymbol{\epsilon}_k^T \Delta\mathbf{x}_k] = E[\mathrm{tr}\; \boldsymbol{\epsilon}_k \Delta\mathbf{x}_k^T] = E[\mathrm{tr}\,(\Delta\mathbf{x}_k - G_k^T \mathbf{x}_{k-1})\,\Delta\mathbf{x}_k^T] = \mathrm{tr}\,(D_k - G_k^T B_k) . \qquad (32)$$

* Equation (31) represents the Yule-Walker equations in statistical estimation (Refs. 5 and 8) or the normal equations in regression analysis (Ref. 8).

Then, by virtue of Eqs. (11) and (12), Eq. (32) reduces to

$$\mathcal{E}^2_{\min} = \mathrm{tr}\; E_k . \qquad (33)$$

That is, the minimum mean-square error is the trace of the conditional covariance matrix.
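As a small numerical illustration of this result, the sketch below treats the simplest case: one previous scalar observation is used to predict one new scalar observation. It assumes a zero-mean stationary pair with $E[x_1^2] = E[x_2^2] = R(0)$ and $E[x_1 x_2] = R(1)$; the numerical values and the names `mse`, `R0`, and `R1` are chosen only for the example. A brute-force search over candidate coefficients lands on the coefficient given by Eq. (11), and the error there equals the value given by Eq. (12).

```python
def mse(h, R0, R1):
    """Mean-square error E[(x_2 - h*x_1)^2] for a zero-mean pair with
    E[x_1^2] = E[x_2^2] = R0 and E[x_1*x_2] = R1."""
    return R0 - 2.0 * h * R1 + h * h * R0


R0, R1 = 1.0, 0.6            # illustrative correlation values
g = R1 / R0                  # Eq. (11) with K_1 = R0 and B_2 = R1
e = R0 - g * R1              # Eq. (12): E_2 = D_2 - G_2^T B_2

# Search a grid of candidate predictor coefficients for the smallest error.
grid = [i / 1000.0 for i in range(-2000, 2001)]
h_best = min(grid, key=lambda h: mse(h, R0, R1))
```

Here `h_best` coincides with `g`, and `mse(g, R0, R1)` coincides with `e`, as the orthogonality principle requires.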

The structure of the sequential classifier can be depicted as in Fig. 4. The classifier consists of a pair of (discrete-time) linear mean-square predictive filters, each of which is designed to estimate one of the processes to be classified. The two resulting error processes $\boldsymbol{\epsilon}_k^1$ and $\boldsymbol{\epsilon}_k^2$ are then fed into a box that implements Eqs. (18) and which represents a quadratic classifier for the two error processes that result when a signature of each class is applied to its corresponding filter. That the quadratic classifier for the error processes can be expressed as a sum of outputs from the incremental classifier as in Eq. (18a) is a result of the fact that the error $\boldsymbol{\epsilon}_k$ is uncorrelated with the previous errors $\boldsymbol{\epsilon}_{k-1}$, $\boldsymbol{\epsilon}_{k-2}$, etc. This fact in turn follows from the orthogonality principle expressed in the theorem. The classifier is optimal for Gaussian input processes and will be further interpreted for general random processes in Sec. IV-C. However, let us first consider how the matrix $G_k$ that defines the linear-predictive filter can be more easily computed when the processes are univariate (i.e., when the signatures consist of the PP amplitude only) and stationary, and the number of pulses observed per stage is equal to one.

Fig. 4. Structure of sequential classifier.

B.  Recursion  Relations  for  Classifier  Parameters 

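A set of recursion relations for the computation of the classifier parameters was described in Sec. II. The recursion consists of Eqs. (10) through (12) with starting value $E_1 = K_1$. For the special case when the processes to be classified are univariate, and a single time sample (radar return) is observed at each stage, we have $n\Delta p = 1$. The matrices $B_k$ and $G_k$ defined in Sec. II then become column matrices or "vectors" and the matrices $D_k$ and $E_k$ degenerate to scalars. For this special case, the variables will be represented by lowercase symbols $\mathbf{b}_k$, $\mathbf{g}_k$, $d_k$, and $e_k$. The recursion relations Eqs. (10) through (12) can be written in this notation as

$$K_k^{-1} = \begin{bmatrix} K_{k-1}^{-1} & \mathbf{0} \\ \mathbf{0}^T & 0 \end{bmatrix} + \frac{1}{e_k} \begin{bmatrix} -\mathbf{g}_k \\ 1 \end{bmatrix} \begin{bmatrix} -\mathbf{g}_k^T & 1 \end{bmatrix} \qquad (34)$$

$$\mathbf{g}_k = K_{k-1}^{-1}\,\mathbf{b}_k \qquad (35)$$

$$e_k = d_k - \mathbf{g}_k^T \mathbf{b}_k . \qquad (36)$$

If, further, the random processes are stationary, the covariance matrix has the special Toeplitz form

$$K_k = \begin{bmatrix} R(0) & R(1) & \cdots & R(k-1) \\ R(1) & R(0) & \ddots & \vdots \\ \vdots & \ddots & \ddots & R(1) \\ R(k-1) & \cdots & R(1) & R(0) \end{bmatrix} \qquad (37)$$

where R(i) represents the correlation function for the process evaluated at lag i.* If Eq. (37) is compared with Eq. (9b), one can observe that

$$\mathbf{b}_k = \begin{bmatrix} R(k-1) \\ R(k-2) \\ \vdots \\ R(1) \end{bmatrix} \qquad (38)$$

and that

$$d_k = R(0) . \qquad (39)$$

* Observe that since the process is zero-mean the covariance is equal to the correlation.

Let us now define the reversal of a vector as another vector whose components are the components of the original vector in reverse order. For the vector $\mathbf{b}_k$, one has

$$\tilde{\mathbf{b}}_k = \mathrm{rev}\; \mathbf{b}_k = \begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(k-1) \end{bmatrix} \qquad (40)$$

from which it follows that

$$\tilde{\mathbf{b}}_k = \begin{bmatrix} \tilde{\mathbf{b}}_{k-1} \\ R(k-1) \end{bmatrix} . \qquad (41)$$

In order to derive a simplified set of recursion relations, Eq. (35) will now be written in a form involving the reversals of $\mathbf{g}_k$ and $\mathbf{b}_k$. Equation (35) can be written in the equivalent form

$$K_{k-1}\,\mathbf{g}_k = \mathbf{b}_k . \qquad (42)$$

If both vectors in Eq. (42) are replaced by their reversals, then in order to maintain a true relation, the matrix $K_{k-1}$ must be replaced by one obtained from itself by reflecting it first about the main diagonal and then about the reverse diagonal. However, because $K_{k-1}$ has the special form [Eq. (37)] (i.e., it is symmetric and Toeplitz), these two operations leave it unchanged. Therefore, one has the relation

$$K_{k-1}\,\tilde{\mathbf{g}}_k = \tilde{\mathbf{b}}_k \qquad (43)$$

and thus

$$\tilde{\mathbf{g}}_k = K_{k-1}^{-1}\,\tilde{\mathbf{b}}_k . \qquad (44)$$

The desired simplified recursion now follows directly, because from Eqs. (44), (34), and (41) one can write

$$\tilde{\mathbf{g}}_k = \left( \begin{bmatrix} K_{k-2}^{-1} & \mathbf{0} \\ \mathbf{0}^T & 0 \end{bmatrix} + \frac{1}{e_{k-1}} \begin{bmatrix} -\mathbf{g}_{k-1} \\ 1 \end{bmatrix} \begin{bmatrix} -\mathbf{g}_{k-1}^T & 1 \end{bmatrix} \right) \begin{bmatrix} \tilde{\mathbf{b}}_{k-1} \\ R(k-1) \end{bmatrix}$$

$$\phantom{\tilde{\mathbf{g}}_k} = \begin{bmatrix} K_{k-2}^{-1} & \mathbf{0} \\ \mathbf{0}^T & 0 \end{bmatrix} \begin{bmatrix} \tilde{\mathbf{b}}_{k-1} \\ R(k-1) \end{bmatrix} + \begin{bmatrix} -\mathbf{g}_{k-1} \\ 1 \end{bmatrix} \frac{R(k-1) - \mathbf{g}_{k-1}^T \tilde{\mathbf{b}}_{k-1}}{e_{k-1}} . \qquad (45)$$

By employing Eq. (44) once again and taking the reversal of the result, we obtain

$$\mathbf{g}_k = \begin{bmatrix} 0 \\ \mathbf{g}_{k-1} \end{bmatrix} + \rho_k \begin{bmatrix} 1 \\ -\tilde{\mathbf{g}}_{k-1} \end{bmatrix} \qquad (46)$$

where

$$\rho_k = \left[ R(k-1) - \mathbf{g}_{k-1}^T \tilde{\mathbf{b}}_{k-1} \right] / e_{k-1} \qquad (47)$$

and where from Eqs. (35), (36), and (39) we have

$$e_k = R(0) - \mathbf{g}_k^T \mathbf{b}_k . \qquad (48)$$

Equations (46) through (48) represent a concise form of the recursion relations to compute the filter coefficients for a univariate stationary process. These are identical to the recursion relations derived by Levinson and others (Refs. 7 through 14).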
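The recursion can be sketched in a few lines of code. The sketch below assumes the ordering used in this report, in which the first component of $\mathbf{g}_k$ multiplies the oldest of the k - 1 previous samples and the last component multiplies the most recent one, and it takes the reflection coefficient to be $\rho_k = [R(k-1) - \mathbf{g}_{k-1}^T\,\mathrm{rev}\,\mathbf{b}_{k-1}]/e_{k-1}$. The function name `levinson` and the stage-indexed lists are illustrative choices.

```python
def levinson(R, k_max):
    """Recursive computation of the prediction parameters for a univariate
    stationary process with correlation values R[0], R[1], ...

    Returns (g, e, rho), each indexed by stage k = 1..k_max (index 0 unused):
    g[k] holds the k-1 prediction coefficients (oldest sample first), e[k] is
    the mean-square prediction error, rho[k] the reflection coefficient.
    """
    g = [None, []]            # stage 1: no previous samples to predict from
    e = [None, R[0]]          # e_1 = R(0)
    rho = [None, None]
    for k in range(2, k_max + 1):
        g_prev = g[k - 1]
        # rev(b_{k-1}) = [R(1), ..., R(k-2)]
        b_prev_rev = [R[i] for i in range(1, k - 1)]
        num = R[k - 1] - sum(gi * bi for gi, bi in zip(g_prev, b_prev_rev))
        r = num / e[k - 1]
        # g_k = [0; g_{k-1}] + rho_k * [1; -rev(g_{k-1})]
        g_k = [r] + [gp - r * gr for gp, gr in zip(g_prev, g_prev[::-1])]
        # e_k = R(0) - g_k^T b_k with b_k = [R(k-1), ..., R(1)]
        b_k = [R[i] for i in range(k - 1, 0, -1)]
        e_k = R[0] - sum(gi * bi for gi, bi in zip(g_k, b_k))
        g.append(g_k)
        e.append(e_k)
        rho.append(r)
    return g, e, rho
```

As a check, for the correlation function R(i) = 0.5^i the recursion returns a single nonzero coefficient of 0.5 on the most recent sample at every stage k ≥ 2, with prediction error 0.75, which is the expected behavior for a first-order autoregressive process.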

Most of the literature related to linear-predictive filtering and the associated statistical models defines the vector of filter coefficients as the reversal of $\mathbf{g}_k$. A comparison of our terminology to that which is most common in the literature is made in Table IV. Equations (46) through (48) can be written in the alternative terminology as


^p+1 


[R(P  + 1) 


p^/ep+l 


ep+l 


R(°)  - Up  £p 


(49) 

(50) 

(51) 


The variable $\rho_k$ is the so-called reflection coefficient in linear-prediction theory or partial correlation coefficient in statistics. An engineering interpretation of the reflection coefficient is given in Refs. 12 and 14. By substituting Eq. (34) in Eq. (36), one can show that

$$e_k = e_{k-1}\,(1 - \rho_k^2) \qquad (52)$$

which can be used as an alternative to Eq. (48) or Eq. (51) in the recursion. Since, by virtue of Eq. (33), $e_k$ represents the mean-square error in prediction of the process using k - 1 previous observations, $e_k$ will always be greater than or equal to zero. Equation (52) thus implies that $|\rho_k|$ must be less than or equal to one, which in turn implies that the mean-square error is a nonincreasing function of k. This is the key justification for the use of the sequential classifier in the extended mode. When a number of decision stages $k_s$ or a prediction order $k_s - 1$ is reached such that the mean-square error "levels off" and/or approaches zero, then there is no need to consider any higher orders of prediction. In terms of classical statistical models one can say that the signatures are adequately modeled by an autoregressive process of order $k_s - 1$. Autoregressive processes are discussed in various references in the statistical literature (e.g., Refs. 5 and 8); such processes are represented exactly by a weighted sum of some number p of their past values plus white noise. Equation (52) implies that since the mean-square error of prediction of a process is a nonincreasing function of k, one can model an arbitrary discrete-time random process to any desired degree of accuracy by an autoregressive process of finite order p. This implication can be further interpreted when the results are presented from a spectral analysis point of view.

Fig. 5. Prediction branch of sequential classifier. [Figure label: innovations filter $H_k(z) = 1 - G_k(z)$.]

C.  Relations  to  Spectral  Analysis 

Figure 5 depicts one branch of the sequential classifier. The linear-predictive filter has been represented in the frequency domain by its z transform

$$G_k(z) = \mathbf{g}_k^T \begin{bmatrix} z^{-k+1} \\ z^{-k+2} \\ \vdots \\ z^{-1} \end{bmatrix} \qquad (53)$$

where z is the discrete-time frequency variable equal to $e^{j2\pi fT}$ and T is the time between samples of the process (the radar pulse repetition interval). Note that when the input $x_k$ is the process which the filter was designed to predict, the output process $\epsilon_k$ is uncorrelated, i.e.,

$$E[\epsilon_k \epsilon_{k-j}] = \begin{cases} e_k\,\delta_{k,k-j} \; ; & k \le k_s \\ e_{k_s}\,\delta_{k,k-j} \; ; & k > k_s . \end{cases} \qquad (54)$$

This fact has been discussed extensively in the literature and follows directly from the theorem in Sec. IV-A. During the period when the classifier is used in the normal mode, the filter is time varying (i.e., the coefficients depend on the time index k) and the error has a flat spectrum whose level is $e_k$. When the classifier is used in the extended mode, the filter becomes time invariant with prediction coefficients $\mathbf{g}_{k_s}$ and the error process $\epsilon_k$ becomes (to any desired degree of approximation) a white process with spectral level $e_{k_s}$. If the input process is Gaussian, then the error process is also Gaussian and is referred to as the innovations process of $x_k$. The filter H(z) = 1 - G(z) (shown within dotted lines in Fig. 5) that generates the error from the input is called the innovations filter. The innovations process is defined (Ref. 15) as a white Gaussian process obtained through a causal and causally invertible filter [this is clearly the case for H(z)]. The importance of the innovations filter in general is that it can be used in the inverted form to generate the input process from white Gaussian noise. This technique has been used extensively for the artificial generation of speech (Refs. 12 through 14), and has recently been studied in connection with the generation of ensembles of radar signatures to approximate live flight test data.

The innovations filter provides an approach to obtaining an estimate of the spectrum of the input process. Since the output process is white noise with spectral level $e_k$, one has

$$\hat{S}_x^k(f) = \frac{e_k}{|1 - G_k(e^{j2\pi fT})|^2} \qquad (55)$$

which implies that the process $x_k$ can be represented to any desired degree of approximation by an all-pole system function [with k - 1 poles, since $\mathbf{g}_k$ has k - 1 coefficients]. Equation (55) is identical to the so-called maximum entropy spectral estimate. The maximum entropy spectral estimate is the spectrum of the "most random" Gaussian process (i.e., that with maximum entropy) whose correlation function matches the correlation function of the process $x_k$ on the lag interval [-k + 1, k - 1] (Refs. 11 and 19).

Another spectral estimate that will be of interest is the maximum likelihood spectral estimate $\check{S}_x^k(f)$ (Refs. 20 and 21), which can be interpreted as the power output of a narrowband causal filter of duration k centered at frequency f that minimizes the power due to frequencies other than f (Ref. 21). It can be shown that the maximum likelihood and the maximum entropy spectral estimates are related (Refs. 11 and 22) as

$$\frac{1}{\check{S}_x^k(f)} = \frac{1}{k} \sum_{j=1}^{k} \frac{1}{\hat{S}_x^j(f)} . \qquad (56)$$

The spectral analysis methods provide an important interpretation of the classifier. In particular, from the definition of the power spectral density for discrete time signals, one can write

$$E[(\epsilon_k)^2] = T \int_{-1/2T}^{1/2T} S_\epsilon^k(f)\, df = T \int_{-1/2T}^{1/2T} |1 - G_k(e^{j2\pi fT})|^2\, S_x(f)\, df \qquad (57)$$

where $S_x(f)$ is the true spectrum of the input. Observe that Eq. (57) holds regardless of the input process; it is not necessary to assume that the input is chosen to make $\epsilon_k$ a white process. The classifier output $h_k$, given recursively by Eqs. (18), can be expressed using Eqs. (19a) and (15) as

$$h_k = \sum_{j=1}^{k} \frac{(\epsilon_j^1)^2}{e_j^1} - \sum_{j=1}^{k} \frac{(\epsilon_j^2)^2}{e_j^2} + \ln \frac{|K_k^1|}{|K_k^2|} \; ; \qquad k \le k_s . \qquad (58)$$

When the classifier is used in the extended mode, its output is given by

$$h_k = h_{k_s} + \frac{1}{e_{k_s}^1} \sum_{j=k_s+1}^{k} (\epsilon_j^1)^2 - \frac{1}{e_{k_s}^2} \sum_{j=k_s+1}^{k} (\epsilon_j^2)^2 + (k - k_s) \ln \frac{e_{k_s}^1}{e_{k_s}^2} \; ; \qquad k > k_s . \qquad (59)$$

By applying Eqs. (55) through (57) to Eqs. (58) and (59), one can express the mean value of the classifier output as

$$E[h_k] = kT \int_{-1/2T}^{1/2T} \frac{S_x(f)}{\check{S}_1^k(f)}\, df - kT \int_{-1/2T}^{1/2T} \frac{S_x(f)}{\check{S}_2^k(f)}\, df + \ln \frac{|K_k^1|}{|K_k^2|} \; ; \qquad k \le k_s \qquad (60)$$

and

$$E[h_k] = E[h_{k_s}] + (k - k_s) \left[ T \int_{-1/2T}^{1/2T} \frac{S_x(f)}{\hat{S}_1^{k_s}(f)}\, df - T \int_{-1/2T}^{1/2T} \frac{S_x(f)}{\hat{S}_2^{k_s}(f)}\, df + \ln \frac{e_{k_s}^1}{e_{k_s}^2} \right] ; \qquad k > k_s \qquad (61)$$

where $\check{S}_1^k(f)$ and $\check{S}_2^k(f)$ are the maximum likelihood spectral estimates and $\hat{S}_1^{k_s}(f)$ and $\hat{S}_2^{k_s}(f)$ are the maximum entropy spectral estimates for the two processes to be identified. Equations (60) and (61) show how the spectra of the processes are employed by the sequential classifier to perform the classification. When the classifier is used in the normal mode [Eq. (60)], the mean classifier output involves a comparison of the spectrum of the input process to the maximum likelihood spectral estimates of each of the two classes of processes to be identified. When the classifier is used in the extended mode [Eq. (61)], the mean output involves, in addition, a comparison of the input process spectrum to the maximum entropy spectral estimates of the processes. Clearly, if the classifier is used in the extended mode for a large number of decision stages $k \gg k_s$, the mean classifier output will depend more on the maximum entropy than on the maximum likelihood spectral estimates. Since, in general, the maximum entropy estimate provides a higher degree of resolution than the maximum likelihood estimate, use of the classifier in the extended mode should be more sensitive to the fine structure of the spectra. On the other hand, since the maximum likelihood method tends to produce a smoother spectral estimate whose amplitude more accurately represents the true power level, use of the classifier in the normal mode should be more sensitive to the overall power level of the spectra. In addition, if the random processes are ergodic, then for $k \gg k_s$, the actual value of the classifier output becomes proportional to the difference of the integrated spectral quotients expressed in Eq. (61). That is

$$h_k \approx (k - k_s) \left[ T \int_{-1/2T}^{1/2T} \frac{S_x(f)}{\hat{S}_1^{k_s}(f)}\, df - T \int_{-1/2T}^{1/2T} \frac{S_x(f)}{\hat{S}_2^{k_s}(f)}\, df + \ln \frac{e_{k_s}^1}{e_{k_s}^2} \right] ; \qquad k \gg k_s . \qquad (62)$$

To see this, observe that if the processes are ergodic, then for $k \gg k_s$

$$\sum_{j=k_s+1}^{k} (\epsilon_j^i)^2 \approx (k - k_s)\, E[(\epsilon_j^i)^2] . \qquad (63)$$

By using Eq. (63) in Eq. (59) and applying Eqs. (55) and (57), one obtains the desired result [Eq. (62)].
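The way the two modes fit together can be sketched for the univariate case. The sketch assumes that the classifier output accumulates, at each stage, the term $(\epsilon^1)^2/e^1 - (\epsilon^2)^2/e^2 + \ln(e^1/e^2)$, with stage-k parameters for $k \le k_s$ and the frozen stage-$k_s$ parameters thereafter; the sign convention of the decision and the thresholds of the sequential test are not represented. The function name and the `(g, e)` model layout (lists indexed by stage, index 0 unused, coefficients ordered oldest sample first) are illustrative.

```python
import math


def classifier_output(x, model1, model2, ks):
    """Running classifier output for a zero-mean signature x.

    Each model is a pair (g, e) of lists indexed by stage 1..ks: g[k] holds
    the k-1 prediction coefficients (oldest sample first) and e[k] the
    mean-square prediction error.  Stages beyond ks reuse the stage-ks
    predictor (extended mode).
    """
    h = 0.0
    outputs = []
    for k in range(1, len(x) + 1):
        s = min(k, ks)                      # stage parameters actually used
        increments = []
        for g, e in (model1, model2):
            past = x[k - s:k - 1]           # the s-1 most recent samples
            eps = x[k - 1] - sum(gi * xi for gi, xi in zip(g[s], past))
            increments.append(eps * eps / e[s] + math.log(e[s]))
        h += increments[0] - increments[1]
        outputs.append(h)
    return outputs


# Illustrative models: two white processes with variances 1 and 4, for which
# every prediction coefficient is zero and the error equals the variance.
white1 = ([None, [], [0.0]], [None, 1.0, 1.0])
white4 = ([None, [], [0.0]], [None, 4.0, 4.0])
h = classifier_output([1.0, 1.0, 1.0, 1.0, 1.0], white1, white4, ks=2)
```

With these white-noise models each stage contributes the same increment, 0.75 - ln 4, so the output changes linearly with the number of returns, and the per-stage work beyond stage $k_s$ stays constant no matter how long the signature is.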


V.  SUMMARY 

This  report  describes  a sequential  form  of  the  quadratic  classifier  (the  optimal  decision 
rule  for  Gaussian  random  processes)  and  its  application  to  the  discrimination  and  classification 
of radar signatures. A special formulation of the classifier shows that each decision stage consists of two steps; namely (a) linear prediction of observations to be made at the given stage and (b) application of the error between the actual and predicted observations to an incremental classifier to make a classification decision.

The storage and computational requirements for the sequential form of classifier are the same as those for a non-sequential quadratic classifier when both are applied to signatures with the same number of returns, and classification results are identical. However, the sequential classifier has the ability to classify signatures by observing only a relatively small number of returns and so can provide a definite real-time advantage.

In addition, the sequential classifier can be operated in an "extended mode" to classify signatures with more returns than those used to design the classifier. As a result, the sequential quadratic classifier can approximate the performance of a non-sequential classifier at a fraction of the storage and computational requirements. Since the sequential classifier rapidly classifies easily identifiable signatures and extends its observation time on more difficult signatures, it makes better use of the radar resources and can achieve a higher overall level of performance.

The  sequential  classifier  as  formulated  here  is  intimately  related  to  estimation-theoretic 
results  in  mean-square  filtering,  autoregressive  time  series  analysis,  and  innovations  process 
concepts.  In  particular,  the  linear  prediction  step  produces  the  optimum  linear  mean-square 
estimate  of  the  signature  based  on  the  previous  observations  regardless  of  the  process  statistics; 
when  the  input  to  the  classifier  is  a signature  from  one  of  the  classes  that  it  was  designed  to 
recognize,  then  the  prediction  error  process  for  that  signature  is  white  noise.  The  incremental 
classifier  is  the  basic  recursive  component  in  a classifier  designed  to  optimally  discriminate 
the  two  white  Gaussian  noise  processes. 

The  classifier  can  be  related  to  some  recent  results  in  spectral  estimation.  These  relations 
provide  an  interpretation  of  the  classifier  in  the  frequency  domain. 
APPENDIX A

COMPUTATION  OF  THE  CONDITIONAL  DENSITY 
FOR  GAUSSIAN  OBSERVATIONS 

The purpose of this appendix is to show that the density for $\Delta\mathbf{x}_k$ conditioned on $\mathbf{x}_{k-1}$ is Gaussian and given by Eq. (16) whenever the observation vector $\mathbf{x}_k$ is Gaussian. To show this, it is necessary to prove Eqs. (13) and (15).

To prove Eq. (13), one first obtains the inverse of the covariance matrix in partitioned form by an adaptation of the well-known Gauss-Jordan technique to partitioned matrices. One starts with the relation

$$K_k K_k^{-1} = I \qquad \text{(A-1)}$$

and performs certain "elementary operations" on both sides of the equation in order to reduce $K_k$ to the identity matrix. At the jth step in the process one has the relation

$$R^{(j)} K_k^{-1} = A^{(j)} \qquad \text{(A-2)}$$

where $R^{(j)}$ is the reduced covariance matrix after performing some number of elementary operations, and $A^{(j)}$ is the matrix resulting from performing those same operations on the identity matrix. When $R^{(j)}$ has been reduced to the identity matrix, $A^{(j)}$ is the desired inverse.

Table A-I lists the steps involved in deriving the partitioned inverse covariance matrix. First, the upper row of partitions is multiplied by $K_{k-1}^{-1}$; then the upper row is multiplied by $B_k^T$ and subtracted from the lower row, and so on. The resulting partitioned inverse is shown in the last step and repeated here using the matrices defined by Eqs. (11) and (12).

$$K_k^{-1} = \begin{bmatrix} K_{k-1}^{-1} + G_k E_k^{-1} G_k^T & -G_k E_k^{-1} \\ -E_k^{-1} G_k^T & E_k^{-1} \end{bmatrix} \qquad \text{(A-3)}$$

Equation (A-3) can be written in the equivalent form

$$K_k^{-1} = \begin{bmatrix} K_{k-1}^{-1} & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} -G_k \\ I \end{bmatrix} E_k^{-1} \begin{bmatrix} -G_k^T & I \end{bmatrix} . \qquad \text{(A-4)}$$

By applying Eq. (5b) to Eq. (8) and employing Eqs. (A-4), (3), and (9a), one arrives at the result

$$\Delta Q_k = (\mathbf{x}_k - \mathbf{m}_k)^T K_k^{-1} (\mathbf{x}_k - \mathbf{m}_k) - (\mathbf{x}_{k-1} - \mathbf{m}_{k-1})^T K_{k-1}^{-1} (\mathbf{x}_{k-1} - \mathbf{m}_{k-1})$$

$$\phantom{\Delta Q_k} = \left[ -G_k^T (\mathbf{x}_{k-1} - \mathbf{m}_{k-1}) + (\Delta\mathbf{x}_k - \Delta\mathbf{m}_k) \right]^T E_k^{-1} \left[ -G_k^T (\mathbf{x}_{k-1} - \mathbf{m}_{k-1}) + (\Delta\mathbf{x}_k - \Delta\mathbf{m}_k) \right]$$

which proves Eq. (13).

TABLE A-I
OPERATIONS TO DERIVE PARTITIONED INVERSE OF COVARIANCE MATRIX

To prove Eq. (15), note from Eq. (A-2) that at any step j the determinants satisfy

$$|R^{(j)}| \cdot |K_k^{-1}| = |A^{(j)}| .$$

If one evaluates this at the third or fourth step in Table A-I, one can obtain the desired result. In particular, for j = III one has

$$|E_k| \cdot |K_k^{-1}| = |K_{k-1}^{-1}| \qquad \text{(A-7)}$$

which proves Eq. (15). Equation (15) when substituted in Eq. (7) leads to the conditional density given by Eq. (16). The additional result, Eq. (13), shows that this density is Gaussian.
APPENDIX B

PARTITIONED  SYMMETRIC  STORAGE 

The parameter matrices $P^1$ and $P^2$ of Eq. (21) are stored in a special form that will be called "partitioned symmetric" form. The partitioned symmetric storage form permits the partitions of a symmetric matrix to be conveniently accessed for processing in FORTRAN. The storage mode is in the form of a large linear array and is completely compatible with the storage modes used for subroutines in the IBM Scientific Subroutine Package.

To access a partition, one needs to have a pointer (index) to the first word in the partition. Partitions along the main diagonal are stored in symmetric form; that is, only the elements above and including the main diagonal are stored (by column). Partitions off the main diagonal are stored in general form; that is, all their elements are stored (by column). This is illustrated in Fig. B-1. Figure B-2 illustrates the difference between symmetric storage and partitioned symmetric storage for a 6 x 6 matrix with the indicated elements and partitions.

Fig. B-1. Partitioned symmetric storage.

Fig. B-2. Comparison of symmetric and partitioned symmetric storage forms.
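A pointer calculation for this storage form might look like the sketch below. The order in which the partitions are laid out in the linear array is not recoverable from the text, so the sketch assumes that they are stored one block column at a time, with the off-diagonal partition of block column c [of size (c - 1)N x N, general form] followed by its N x N diagonal partition (symmetric form); N stands for $n\Delta p$, and indices are 1-based as in FORTRAN. The function names are illustrative.

```python
def partition_pointers(c, N):
    """1-based pointers to the first word of the off-diagonal partition and
    of the diagonal partition in block column c (assumed layout: block
    columns in order, off-diagonal partition first)."""
    # Words used by block columns 1..c-1: column j holds (j-1)*N*N words in
    # general form plus N*(N+1)/2 words in symmetric form.
    words_before = (c - 1) * (c - 2) // 2 * N * N + (c - 1) * (N * (N + 1) // 2)
    off_diag = words_before + 1
    diag = off_diag + (c - 1) * N * N
    return off_diag, diag


def general_index(ptr, i, j, rows):
    """Word holding element (i, j) of a general partition stored by column."""
    return ptr + (j - 1) * rows + (i - 1)


def symmetric_index(ptr, i, j):
    """Word holding element (i, j), i <= j, of a symmetric partition whose
    upper triangle is stored by column."""
    return ptr + j * (j - 1) // 2 + (i - 1)


def total_words(ks, N):
    """Total length of the array for ks block columns."""
    last_off, last_diag = partition_pointers(ks, N)
    return last_diag + N * (N + 1) // 2 - 1
```

For a 6 x 6 matrix with 2 x 2 partitions (N = 2, three block columns) this layout gives pointers (1, 1), (4, 8), and (11, 19) and a total of 21 words, the same as ordinary symmetric storage of a 6 x 6 matrix; only the ordering of the words differs.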
APPENDIX C

PROOF  OF  THE  ORTHOGONALITY  PRINCIPLE 

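The proof of the theorem stated in Sec. IV-A is given here. The proof follows that of Papoulis (Ref. 25).

Theorem: Let the estimate $\Delta\hat{\mathbf{x}}_k$ be defined so that the error $\boldsymbol{\epsilon}_k$ is orthogonal to the observations $\mathbf{x}_{k-1}$, i.e., $E[\boldsymbol{\epsilon}_k \mathbf{x}_{k-1}^T]$ is a matrix of zeros. Then $\Delta\hat{\mathbf{x}}_k$ minimizes Eq. (26) and the minimum mean-square error is given by

$$\mathcal{E}^2_{\min} = E[\boldsymbol{\epsilon}_k^T \Delta\mathbf{x}_k] .$$

Proof: First note that since $\boldsymbol{\epsilon}_k$ is orthogonal to $\mathbf{x}_{k-1}$, $\boldsymbol{\epsilon}_k$ is also orthogonal to any linear transformation of $\mathbf{x}_{k-1}$. That is, since $E[\boldsymbol{\epsilon}_k \mathbf{x}_{k-1}^T] = [\,0\,]$, then $E[\boldsymbol{\epsilon}_k (A\mathbf{x}_{k-1})^T] = [\,0\,]$ for any matrix A. Now let H be any matrix of predictor coefficients and let G be the matrix that results in $\boldsymbol{\epsilon}_k$ orthogonal to $\mathbf{x}_{k-1}$. The prediction error using H is thus given by

$$\Delta\mathbf{x}_k - H^T \mathbf{x}_{k-1} = \boldsymbol{\epsilon}_k + (G - H)^T \mathbf{x}_{k-1}$$

where $\boldsymbol{\epsilon}_k$ is the error that results when G is used. Since $\boldsymbol{\epsilon}_k$ is orthogonal to any linear transformation of $\mathbf{x}_{k-1}$ [and, in particular, the transformation $(G - H)^T$], the mean-square error is given by

$$E[\,|\Delta\mathbf{x}_k - H^T \mathbf{x}_{k-1}|^2\,] = E[\,|\boldsymbol{\epsilon}_k|^2\,] + E[\,|(G - H)^T \mathbf{x}_{k-1}|^2\,]$$

which is minimized for H = G. Further, since $\boldsymbol{\epsilon}_k$ is orthogonal to any linear transformation of $\mathbf{x}_{k-1}$, one has

$$\mathcal{E}^2_{\min} = E[\boldsymbol{\epsilon}_k^T \boldsymbol{\epsilon}_k] = E[\boldsymbol{\epsilon}_k^T (\Delta\mathbf{x}_k - G^T \mathbf{x}_{k-1})] = E[\boldsymbol{\epsilon}_k^T \Delta\mathbf{x}_k] .$$

Q.E.D.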